
KIISE Transactions on Computing Practices


Korean Title (translated): A Performance Comparison of Neural Machine Translation Systems through Word Embedding Analysis: Focusing on Korean-Japanese and Korean-English
English Title: A Performance Comparison of Neural Machine Translation Systems through Vocabulary sets and Word Embeddings: Focus on Korean-English and Korean-Japanese
Author(s): Yong-Seok Choi, Yo-Han Park, Seung Yun, Sanghun Kim, Kong Joo Lee
Citation: Vol. 28, No. 02, pp. 81 ~ 88 (Feb. 2022)
Korean Abstract (translated):
In this study, we build pre-trained models using MASS and fine-tune them on parallel data to construct Korean-English and Korean-Japanese machine translation models. Korean, Japanese, and English all use different writing systems. Korean and Japanese follow Subject-Object-Verb word order, whereas English follows Subject-Verb-Object order. We evaluate neural machine translation performance according to whether the two languages share a writing system and how similar their word orders are. To analyze the performance differences of the models through their word embeddings, we conducted a word translation experiment and a sentence translation retrieval experiment. The results show that the encoder's word embeddings are far more important than the decoder's, and that performance is better for Korean-Japanese than for Korean-English. In the sentence translation retrieval experiment, a large performance improvement was observed for Korean-English with only a small amount of parallel data.
English Abstract:
In this study, we pre-trained MASS models and built neural machine translation (NMT) systems for Korean-English and Korean-Japanese on top of them. Korean, Japanese, and English use different writing scripts. Korean and Japanese are Subject-Object-Verb languages, while English is a Subject-Verb-Object language. We evaluated the performance of the NMT systems according to similarities between the languages, such as word order and writing script. To compare the NMT models from the perspective of word embeddings, we conducted two experiments: word translation and sentence translation retrieval using the word embeddings learned by the NMT models. The accuracies of word translation and sentence translation retrieval were higher for the word embeddings of the Korean-Japanese NMT model than for those of the Korean-English pair. Moreover, the word embeddings learned by the encoder were more important than those learned by the decoder. Based on the sentence translation retrieval results for the Korean-English NMT model, we found that a Korean-English unsupervised NMT model can be significantly improved when trained with even a small amount of parallel data.
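The sentence translation retrieval experiment described in the abstract can be approximated with a minimal nearest-neighbor sketch: pool each sentence's word embeddings into a single vector and pick the candidate translation with the highest cosine similarity. The toy two-dimensional vectors below are hypothetical; the paper's actual embeddings come from the trained NMT models.

```python
import math

def mean_pool(vectors):
    """Average a sentence's word vectors into one sentence vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_word_vectors, candidate_sentences):
    """Return the index of the candidate sentence whose pooled
    vector is closest (by cosine similarity) to the query's."""
    q = mean_pool(query_word_vectors)
    scores = [cosine(q, mean_pool(c)) for c in candidate_sentences]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical source sentence (two word vectors) and three candidates.
source = [[1.0, 0.0], [0.9, 0.1]]
candidates = [[[0.0, 1.0]], [[1.0, 0.05]], [[-1.0, 0.0]]]
print(retrieve(source, candidates))  # index of the best-matching candidate
```

Mean pooling and cosine similarity are a common baseline for cross-lingual sentence retrieval; if the embedding spaces of the two languages are well aligned, the correct translation ranks first.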
Keywords: MASS, machine translation, word embedding, writing scripts, SVO order, SOV order